AITopics | k-means clustering


k-means clustering


CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering

Das, Krishnendu, Gupta, Sumit, Kumar, Awadhesh

arXiv.org Artificial Intelligence, Jul-14-2025

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields, such as bioinformatics, social network analysis, and image processing. However, clustering accuracy remains a major challenge in large datasets. This paper presents a comprehensive overview of strategies for selecting the optimal K in clustering, with a focus on balancing clustering precision and computational efficiency in complex data environments. In addition, this paper introduces improvements to clustering techniques for text and image data to provide insights into better computational performance and cluster validity. The proposed approach is based on the Condensed Silhouette method, along with statistical methods such as Local Structures, Gap Statistics, and a Class-Consistency Ratio and Cluster Overlap Index (CCR-COI) based algorithm, to calculate the best value of K for K-means clustering. The results of comparative experiments show that the proposed approach achieves up to 99% faster execution times on high-dimensional datasets while retaining both precision and scalability, making it highly suitable for real-time clustering needs or scenarios demanding efficient clustering with minimal resource utilization. Clustering is a critical component of unsupervised machine learning, with the K-means algorithm being particularly favored for its simplicity, speed, and interpretability. Nonetheless, a major difficulty lies in accurately identifying the best number of clusters, K, especially with large, high-dimensional datasets where it is crucial to strike an effective balance between computational efficiency and accuracy.
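To make the idea of choosing K by cluster validity concrete, here is a minimal sketch of the plain (uncondensed) silhouette coefficient used to compare candidate partitions. It is not the paper's Condensed Silhouette or CCR-COI algorithm; the toy points and the hand-written candidate labelings are assumptions made purely for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (standard definition)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the sum includes p itself at distance 0, hence the len - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy data: three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# Hypothetical candidate partitions, one per K (hand-written, not computed).
candidates = {
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

The quadratic number of pairwise distances in this plain version is the cost that faster silhouette variants aim to reduce on large datasets.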

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.08311

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Reviews: k-Means Clustering of Lines for Big Data

Neural Information Processing Systems, Jan-24-2025, 04:36:11 GMT

The authors consider the problem of clustering a set of lines in R^d. The goal is to minimize the k-means objective: given a set L of n lines in R^d, find the best set of k points c_1,...,c_k in R^d so as to minimize sum_{l in L} min_{c_i} dist(c_i, l)^2. This is a clean, nicely motivated problem. The authors provide a coreset construction (namely, a small summary of the input such that any alpha-approximation for the summary yields an alpha(1 + epsilon)-approximation for the entire input). This implies the first (1 + epsilon)-approximation for the problem with running time nd exp(poly(k)), together with a streaming algorithm with similar running time and memory size 2^{poly(k)} log n. En route to the result the authors provide a bicriteria approximation algorithm: namely, a solution that contains O(k (log n dk log k)) centers and whose cost is at most 4 times the cost of the optimal solution with k centers.
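The objective described in this review can be written out directly. The sketch below only evaluates the cost of a given set of centers; it does not implement the paper's coreset or approximation algorithm. Representing each line as a pair (point on the line, direction vector) and the example lines and centers are assumptions chosen for illustration.

```python
def sq_dist_point_line(c, a, v):
    """Squared Euclidean distance from point c to the line {a + t*v}."""
    w = [ci - ai for ci, ai in zip(c, a)]
    vv = sum(vi * vi for vi in v)
    if vv == 0:
        raise ValueError("direction vector must be nonzero")
    # t is the parameter of the orthogonal projection of c onto the line
    t = sum(wi * vi for wi, vi in zip(w, v)) / vv
    return sum((wi - t * vi) ** 2 for wi, vi in zip(w, v))


def lines_kmeans_cost(lines, centers):
    """Sum over lines of the squared distance to the nearest center."""
    return sum(
        min(sq_dist_point_line(c, a, v) for c in centers)
        for a, v in lines
    )


# Hypothetical example in R^2: the x-axis, the line y = 2, and the line x = 5.
lines = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((0.0, 2.0), (1.0, 0.0)),
    ((5.0, 0.0), (0.0, 1.0)),
]
centers = [(0.0, 1.0), (5.0, 7.0)]

cost = lines_kmeans_cost(lines, centers)  # 1 + 1 + 0
```

The first center sits midway between the two horizontal lines (squared distance 1 to each), and the second center lies on the vertical line, so the total cost is 2.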

algorithm, big data, k-means clustering, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.52)


Reviews: k-Means Clustering of Lines for Big Data

Neural Information Processing Systems, Jan-24-2025, 04:36:00 GMT

This paper proposes a PTAS for k-means clustering of lines. The key contribution is the construction of a small coreset, on which brute-force algorithms are run. The authors also extend this to the streaming setting. An important computer vision application is used as motivation. The authors should revise the final version to address the issues raised by the reviewers, and make it more readable to researchers who work in related areas but not in this exact one.

big data, k-means clustering

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)


k-Means Clustering of Lines for Big Data

Neural Information Processing Systems, Oct-10-2024, 03:49:09 GMT

The input to the k-mean for lines problem is a set L of n lines in R^d, and the goal is to compute a set of k centers (points) in R^d that minimizes the sum of squared distances over every line in L and its nearest center. This is a straightforward generalization of the k-mean problem where the input is a set of n points instead of lines. We suggest the first PTAS that computes a (1 + epsilon)-approximation to this problem in time O(n log n) for any constant approximation error epsilon in (0, 1), and constant integers k, d >= 1. This is by proving that there is always a weighted subset (called coreset) of d k^{O(k)} log(n) / epsilon^2 lines in L that approximates the sum of squared distances from L to any given set of k points. Using the traditional merge-and-reduce technique, this coreset implies results for a streaming set (possibly infinite) of lines to M machines in one pass (e.g.
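The merge-and-reduce technique mentioned in this abstract can be sketched generically. In the sketch below, `reduce_summary` is a stand-in that uses weighted random sampling; it is explicitly not the paper's coreset construction and carries none of its guarantees. The function names, the chunked integer stream, and the summary size are all assumptions for illustration.

```python
import random


def reduce_summary(weighted_items, size, rng):
    """Stand-in reduction (NOT the paper's coreset): sample `size` items
    with probability proportional to weight, preserving total weight."""
    if len(weighted_items) <= size:
        return list(weighted_items)
    items = [item for item, _ in weighted_items]
    weights = [w for _, w in weighted_items]
    total = sum(weights)
    sample = rng.choices(items, weights=weights, k=size)
    return [(item, total / size) for item in sample]


def merge_and_reduce(stream_chunks, size, seed=0):
    """One pass over the stream, keeping at most one summary per level."""
    rng = random.Random(seed)
    buckets = {}  # level -> summary (list of (item, weight) pairs)
    for chunk in stream_chunks:
        summary = reduce_summary([(x, 1.0) for x in chunk], size, rng)
        level = 0
        # Like carrying in a binary counter: merge equal-level summaries,
        # reduce the union, and promote the result one level up.
        while level in buckets:
            summary = reduce_summary(buckets.pop(level) + summary, size, rng)
            level += 1
        buckets[level] = summary
    # The final summary is the union of the surviving buckets.
    return [pair for level in sorted(buckets) for pair in buckets[level]]


# Hypothetical stream: 8 chunks of 10 items (integers stand in for lines).
chunks = [list(range(i * 10, (i + 1) * 10)) for i in range(8)]
summary = merge_and_reduce(chunks, size=4)
total_weight = sum(w for _, w in summary)
```

Because at most one summary is stored per level, memory grows with the logarithm of the number of chunks seen rather than with the stream length, which is why a small coreset translates into a small-memory streaming algorithm.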

big data, coreset, k-means clustering, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)


QoS-Nets: Adaptive Approximate Neural Network Inference

Trommer, Elias, Waschneck, Bernd, Kumar, Akash

arXiv.org Artificial Intelligence, Oct-10-2024

In order to vary the arithmetic resource consumption of neural network applications at runtime, this work proposes the flexible reuse of approximate multipliers for neural network layer computations. We introduce a search algorithm that chooses an appropriate subset of approximate multipliers of a user-defined size from a larger search space and enables retraining to maximize task performance. Unlike previous work, our approach can output more than a single, static assignment of approximate multiplier instances to layers. These different operating points allow a system to gradually adapt its Quality of Service (QoS) to changing environmental conditions by increasing or decreasing its accuracy and resource consumption. QoS-Nets achieves this by reassigning the selected approximate multiplier instances to layers at runtime. To combine multiple operating points with the use of retraining, we propose a fine-tuning scheme that shares the majority of parameters between operating points, with only a small number of additional parameters required per operating point. In our evaluation on MobileNetV2, QoS-Nets is used to select four approximate multiplier instances for three different operating points. These operating points result in power savings for multiplications between 15.3% and 42.8% at a Top-5 accuracy loss between 0.3 and 2.33 percentage points. Through our fine-tuning scheme, all three operating points together increase the model's parameter count by only 2.75%.

accuracy, consumption, operating point, (15 more...)

arXiv.org Artificial Intelligence

2410.07762

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Germany > Saxony > Dresden (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


CTG-KrEW: Generating Synthetic Structured Contextually Correlated Content by Conditional Tabular GAN with K-Means Clustering and Efficient Word Embedding

Samanta, Riya, Saha, Bidyut, Ghosh, Soumya K., Das, Sajal K.

arXiv.org Artificial Intelligence, Sep-3-2024

Conditional Tabular Generative Adversarial Networks (CTGAN) and their various derivatives are attractive for their ability to efficiently and flexibly create synthetic tabular data, showcasing strong performance and adaptability. However, such models have certain critical limitations. The first is their inability to preserve the semantic integrity of contextually correlated words or phrases. For instance, the skillset in freelancer profiles is one such attribute, where individual skills are semantically interconnected and indicative of specific domain interests or qualifications. The second is that, when applied to generate contextually correlated tabular content, traditional approaches not only generate semantically shallow content but also consume huge memory resources and CPU time during the training stage. To address these problems, we introduce a novel framework, CTG-KrEW (Conditional Tabular GAN with K-Means Clustering and Word Embedding), which is adept at generating realistic synthetic tabular data where attributes are collections of semantically and contextually coherent words. CTG-KrEW is trained and evaluated using a dataset from Upwork, a real-world freelancing platform. Comprehensive experiments were conducted to analyze the variability, contextual similarity, frequency distribution, and associativity of the generated data, along with testing the framework's system feasibility. CTG-KrEW also takes around 99% less CPU time and has a 33% smaller memory footprint than the conventional approach. Furthermore, we developed KrEW, a web application to facilitate the generation of realistic data containing skill-related information. This application, available at https://riyasamanta.github.io/krew.html, is freely accessible to both the general public and the research community.

ctg-krew, dataset, skillset, (14 more...)

arXiv.org Artificial Intelligence

2409.01628

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Missouri (0.04)
North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.65)


Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering

Yan, Yubing, Moreau, Camille, Wang, Zhuoyue, Fan, Wenhan, Fu, Chengqian

arXiv.org Artificial Intelligence, Jul-11-2024

Keywords: recommendation system; machine learning; Non-[...] The proliferation of digital content has necessitated the development of effective recommendation systems to aid users in navigating vast amounts of data. This research aims to explore and implement advanced machine learning techniques [1-6] to create a high-performing movie [...] The study addresses the following research questions: What are the most effective machine [...] How do these models compare in terms of accuracy and relevance? [...] groups based on their viewing patterns. [...] Agent Recurrent Deterministic Policy Gradient (MA-RDPG) algorithm, as suggested by Zhao et al., this research aims to optimize overall system performance through enhanced [...] Previous studies have extensively explored collaborative filtering techniques for recommendation systems. [...] (2001) [13] demonstrated the effectiveness of matrix factorization in uncovering latent user-item interactions. [...] et al. (2009) [14] further refined these techniques, leading to [...]

arxiv preprint arxiv, recommendation, recommendation system, (9 more...)

arXiv.org Artificial Intelligence

2407.08916

Country:

North America > United States > New York (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)


A Review of Machine Learning-based Security in Cloud Computing

Babaei, Aptin, Kebria, Parham M., Dalvand, Mohsen Moradi, Nahavandi, Saeid

arXiv.org Artificial Intelligence, Sep-9-2023

Cloud Computing (CC) is revolutionizing the way IT resources are delivered to users, allowing them to access and manage their systems with increased cost-effectiveness and simplified infrastructure. However, with the growth of CC comes a host of security risks, including threats to availability, integrity, and confidentiality. To address these challenges, Machine Learning (ML) is increasingly being used by Cloud Service Providers (CSPs) to reduce the need for human intervention in identifying and resolving security issues. With the ability to analyze vast amounts of data, and make high-accuracy predictions, ML can transform the way CSPs approach security. In this paper, we will explore some of the most recent research in the field of ML-based security in Cloud Computing. We will examine the features and effectiveness of a range of ML algorithms, highlighting their unique strengths and potential limitations. Our goal is to provide a comprehensive overview of the current state of ML in cloud security and to shed light on the exciting possibilities that this emerging field has to offer.

algorithm, cloud computing, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2309.04911

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)
Asia > China (0.04)

Genre: Overview (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)


Clustering an African Hairstyle Dataset Using PCA and K-means

Nicrocia, Teffo Phomolo, Adewale, Owolawi Pius, Diana, Pholo Moanda

arXiv.org Artificial Intelligence, May-25-2023

The adoption of digital transformation has not extended to building an African face shape classifier. African women rely on beauty standards recommendations, personal preference, or the newest trends in hairstyles to decide on the appropriate hairstyle for them. In this paper, an approach is presented that uses K-means clustering to classify African women's images. In order to identify potential facial clusters, Haarcascade is used for feature-based training, and K-means clustering is applied for image classification.
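For readers unfamiliar with the clustering step named here, the following is a minimal sketch of Lloyd's algorithm for K-means on small feature vectors. It is not the authors' pipeline: the Haar cascade feature extraction and the PCA step are omitted, and the 2-D toy vectors stand in, hypothetically, for image features.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # simple random initialization
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Hypothetical 2-D feature vectors forming two well-separated groups.
features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(features, k=2)
```

On real image data the feature vectors would have many more dimensions, which is where a dimensionality reduction step such as PCA is typically applied before clustering.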

artificial intelligence, international journal, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.06061

Country:

North America > United States > Georgia > Clarke County > Athens (0.14)
Africa > South Africa > Gauteng > Pretoria (0.05)
Europe > Italy (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)


K-means Clustering Based Feature Consistency Alignment for Label-free Model Evaluation

Miao, Shuyu, Zheng, Lin, Liu, Jingjing, Jin, Hong

arXiv.org Artificial Intelligence, Apr-17-2023

The label-free model evaluation aims to predict the model performance on various test sets without relying on ground truths. The main challenge of this task is the absence of labels in the test data, unlike in classical supervised model evaluation. This paper presents our solutions for the 1st DataCV Challenge of the Visual Dataset Understanding workshop at CVPR 2023. Firstly, we propose a novel method called K-means Clustering Based Feature Consistency Alignment (KCFCA), which is tailored to handle the distribution shifts of various datasets. KCFCA utilizes the K-means algorithm to cluster labeled training sets and unlabeled test sets, and then aligns the cluster centers with feature consistency. Secondly, we develop a dynamic regression model to capture the relationship between the shifts in distribution and model accuracy. Thirdly, we design an algorithm to discover the outlier model factors, eliminate the outlier models, and combine the strengths of multiple autoeval models. On the DataCV Challenge leaderboard, our approach secured 2nd place with an RMSE of 6.8526. Our method significantly improved over the best baseline method by 36% (6.8526 vs. 10.7378). Furthermore, our method achieves a relatively more robust and optimal single model performance on the validation dataset.
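The reported 36% figure follows from the two RMSE values quoted in the abstract. The short sketch below shows the arithmetic, together with the standard RMSE definition; the function names and the toy predicted/actual accuracies are assumptions for illustration and are not taken from the challenge data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def relative_improvement(baseline, ours):
    """Fractional reduction in error relative to the baseline."""
    return (baseline - ours) / baseline


# RMSE values quoted in the abstract: best baseline vs. the proposed method.
improvement = relative_improvement(10.7378, 6.8526)
improvement_percent = round(improvement * 100)

# Hypothetical toy example of RMSE on predicted vs. true accuracies (in %).
toy_error = rmse([70.0, 80.0, 90.0], [70.0, 80.0, 92.0])
```

The relative reduction is (10.7378 - 6.8526) / 10.7378, which rounds to 36%.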

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.09758

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Indonesia > Bali (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
